Setup

Please complete this R-markdown document with your group by answering the questions in intuit-quickbooks.pdf on Dropbox (week6/readings/). Create an HTML file with all your results and comments and push both the Rmarkdown and HTML file to GitLab when your team is done. All results MUST be reproducible (i.e., the TA and I must be able to recreate the HTML from the Rmarkdown file without changes or errors). This means that you should NOT use any R-packages that are not part of the rsm-msba-spark docker container.

This is the first group assignment for MGTA 455 and you will be using git and GitLab. If two people edit the same file at the same time you could get what is called a “merge conflict”. git will not decide for you whose changes to accept, so the team-lead will have to determine which edits to use. To avoid merge conflicts, always click “pull” in Rstudio before you start working on a file. Then, when you are done, save and commit your changes, and then push them to GitLab. Make this a habit!

If multiple people are going to work on the assignment at the same time, I recommend you work on different files. You can use source to include R code in your Rmarkdown document or include other R(markdown) documents in the main assignment file.

Group work-flow tips as discussed during ICT in Summer II are shown below:

A graphical depiction of the group work-flow is shown below:

Additional resources on the use of git are linked below:

Question answers

Part I: Exploratory Data Analysis:

   

First, let’s examine the correlations among all the variables.
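The code chunks are hidden in the rendered output; a minimal sketch of this step follows, with a small stand-in data frame (in the assignment the columns come from the intuit75k data, and res1 is recoded to 0/1 so it can enter the correlation matrix).

```r
# Toy stand-in for the case data: res1 recoded to 0/1
dat <- data.frame(
  res1    = c(1, 0, 0, 1, 0),
  numords = c(3, 1, 2, 5, 1),
  dollars = c(120, 40, 75, 300, 35),
  last    = c(4, 20, 12, 2, 30)
)
round(cor(dat), 2)   # pairwise Pearson correlations
```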

 

   

Next, we want to explore how response rate varies in terms of each feature we’re interested in.
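A sketch of how such a breakdown can be computed, here for zip_bins with toy data (in the assignment res1 is "Yes"/"No" in the intuit75k data):

```r
# Toy data: response rate per zip bin
toy <- data.frame(
  zip_bins = c(1, 1, 1, 2, 2, 3, 3, 3),
  res1     = c("Yes", "Yes", "No", "No", "No", "Yes", "No", "No")
)
resp_rate <- tapply(toy$res1 == "Yes", toy$zip_bins, mean)
resp_rate   # mean response per bin
```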

   

We can reach the following conclusions based on the EDA output:
 

  1. The customers in zip_bins = 1 have a markedly higher response rate than customers in the other zip bins.

  2. The variables “sex” and “bizflag” have no significant effect on “res1”.

  3. Based on the “Recency, Frequency and Monetary” framework, we are more interested in how recently a customer made a purchase than in how early the first order was placed; we therefore pick “last” over “sincepurch” for the model.

  4. Also based on the “Recency, Frequency and Monetary” framework, we are more concerned with how frequently a customer places orders than with how much money has been spent; moreover, the two features are strongly related, since more frequent buyers usually spend more in total. We therefore use only one of them, “numords”, in our model.

Based on the finding that customers in zip_bins = 1 have a markedly higher response rate than customers in the other zip bins, we think it is worth giving more weight to customers in this bin when predicting purchase probability. A new variable, “zip_one”, is created to enable the model to do so.

 

  id   zip zip_bins zip_one
1  1 94553       18   FALSE
2  2 53190       10   FALSE
3  3 37091        8   FALSE
4  4 02125        1    TRUE
5  5 60201       11   FALSE
6  6 12309        3   FALSE
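The flag itself is a one-liner; a sketch (toy rows mirror the table above):

```r
# zip_one is TRUE only for customers in zip bin 1
toy <- data.frame(id = 1:4, zip_bins = c(18, 10, 8, 1))
toy$zip_one <- toy$zip_bins == 1
toy
```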

 

We then investigated further to see why bin 1 has such a high response rate. Breaking down the actual zip codes, we explored the response rate by area (represented by the first three digits of the zip code). We found that the Virgin Islands, whose zip codes start with “008”, account for 1,891 responses with a response rate of 0.398, much higher than any other area. We therefore create another new variable, “VI”.

 

  id   zip zip_bins    VI
1  1 94553       18 FALSE
2  2 53190       10 FALSE
3  3 37091        8 FALSE
4  4 02125        1 FALSE
5  5 60201       11 FALSE
6  6 12309        3 FALSE
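A sketch of the VI flag; note that zip must be stored as character so leading zeros survive ("00804" below is an illustrative Virgin Islands zip code):

```r
# Flag zip codes whose first three digits are "008"
toy <- data.frame(zip = c("94553", "00804", "02125"), stringsAsFactors = FALSE)
toy$VI <- substr(toy$zip, 1, 3) == "008"
toy
```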

       


Part II: Initial Model Development

   

Based on the cost and margin given, we use the break-even response rate of 0.024 as the cut-off to label whether Intuit should mail a customer in the second wave.
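The cut-off follows from the usual break-even formula; the $1.41 cost per mail piece and $60 margin per responder used below are assumptions read from the case (they are consistent with the mailing-cost and margin totals reported later in this part).

```r
# Break-even response rate = cost per mail piece / margin per responder
mail_cost <- 1.41   # assumed cost per mailing ($)
margin    <- 60     # assumed margin per responder ($)
breakeven <- mail_cost / margin
breakeven           # 0.0235, i.e. roughly 0.024
```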
 
 

 
 

Sequential RFM model

 

We index every customer in the dataset with an RFM id and split the data into training and validation sets. Using the break-even rate as the cut-off on the training set, we keep the profitable RFM ids to mail in the validation set.

 

Partial list of RFM ids

 [1] "142" "354" "354" "154" "224" "232" "453" "452" "311" "124"
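The selection logic can be sketched as follows with toy data; xtile() here is a base-R stand-in for a quintile helper, and independent (rather than sequential) bins are used to keep the example short.

```r
# Base-R quintile helper (stand-in; rank() avoids tied break points)
xtile <- function(x, n = 5)
  as.integer(cut(rank(x, ties.method = "first"), n, labels = FALSE))

set.seed(1)
toy <- data.frame(
  last    = sample(1:36, 200, replace = TRUE),
  numords = sample(1:6, 200, replace = TRUE),
  dollars = runif(200, 20, 300),
  res1    = rbinom(200, 1, 0.05)
)
# Three-digit RFM id: recency, frequency, monetary quintiles
toy$rfm <- paste0(xtile(toy$last), xtile(toy$numords), xtile(toy$dollars))

# Response rate per RFM cell; mail only cells above the 0.024 break-even
cell_rate  <- tapply(toy$res1, toy$rfm, mean)
mail_cells <- names(cell_rate)[cell_rate > 0.024]
```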

 
 

Evaluate the performance of the sequential RFM model

 

Training

Based on our analysis, Intuit should mail 42,842 customers, i.e., 81.60% of the training set.
The response rate for the selected customers is predicted to be 5.40% (2,312 buyers), while the actual response rate is 4.76% (2,498 buyers).
The predicted margin is $138,720.00, while the actual margin is $149,880.00.
The expected profit is $78,313. The mailing cost is estimated at $60,407, with a ROME of 1.30.
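The profit and ROME figures follow directly from these components; a sketch (the per-unit cost and margin, $1.41 and $60, are assumptions read from the case):

```r
mailed    <- 42842      # customers selected for mailing
buyers    <- 2312       # predicted buyers
mail_cost <- 1.41       # assumed $ per mail piece
margin    <- 60         # assumed $ margin per responder

cost   <- mailed * mail_cost        # ~ $60,407
profit <- buyers * margin - cost    # ~ $78,313
rome   <- profit / cost             # ~ 1.30 (return on marketing expenditure)
```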

 

Validation

Based on our analysis, Intuit should mail 18,540 customers, i.e., 82.40% of the validation set.
The response rate for the selected customers is predicted to be 5.36% (994 buyers), while the actual response rate is 4.90% (1,103 buyers).
The predicted margin is $59,640.00, while the actual margin is $66,180.00.
The expected profit is $33,499. The mailing cost is estimated at $26,141, with a ROME of 1.28.
 

Logistic Regression

 

Uncertain whether to choose “zip_one”, which gives more weight to all customers in zip_bin 1, or “VI”, which gives more weight only to customers in the Virgin Islands, we decided to build one model with each variable and pick the one that contributes more profit on the validation set.

 

For both models, we re-estimate the model 100 times, each time on a different bootstrap sample of the data; we then use the 5th percentile of the 100 predictions for each customer as a lower bound on the estimated purchase probability.
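The bootstrap loop can be sketched as follows: refit the logit on a resampled training set 100 times and keep each round's predictions (toy data and a single predictor stand in for the intuit75k variables).

```r
set.seed(1)
n <- 500
train <- data.frame(x = rnorm(n))
train$res1 <- rbinom(n, 1, plogis(-2 + train$x))

# One column of predictions per bootstrap sample
boot_pred <- sapply(seq_len(100), function(i) {
  bs  <- train[sample(n, n, replace = TRUE), ]        # resample with replacement
  fit <- glm(res1 ~ x, family = binomial, data = bs)  # refit the logit
  predict(fit, newdata = train, type = "response")    # predict on original data
})
dim(boot_pred)   # rows = customers, columns = bootstrap samples
```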
 
 

Logistic Regression A

with “zip_one”, “numords”, “last”, “version1”, “owntaxprod”, “upgraded” as predictors.

 

Logistic regression (GLM)
Data                 : train
Response variable    : res1
Level                : Yes in res1
Explanatory variables: zip_one, numords, last, version1, owntaxprod, upgraded 
Null hyp.: there is no effect of x on res1
Alt. hyp.: there is an effect of x on res1

                 OR coefficient std.error z.value p.value    
 (Intercept)             -3.712     0.063 -58.542  < .001 ***
 zip_one|TRUE 7.526       2.018     0.055  36.799  < .001 ***
 numords      1.313       0.272     0.016  17.451  < .001 ***
 last         0.957      -0.044     0.002 -18.144  < .001 ***
 version1     2.178       0.778     0.053  14.791  < .001 ***
 owntaxprod   1.361       0.308     0.103   3.005   0.003 ** 
 upgraded     2.698       0.993     0.051  19.591  < .001 ***

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Pseudo R-squared: 0.112
Log-likelihood: -8923.029, AIC: 17860.057, BIC: 17922.137
Chi-squared: 2243.587 df(6), p.value < .001 
Nr obs: 52,500 

 
 

Bootstrap sample predictions on the training set

  id pred_logit1 pred_logit2 pred_logit3 pred_logit4 pred_logit5 pred_logit6 pred_logit7 pred_logit8 pred_logit9 pred_logit10 pred_logit11 pred_logit12 pred_logit13 pred_logit14 pred_logit15 pred_logit16 pred_logit17 pred_logit18 pred_logit19
1  1  0.03358735  0.03343499  0.03200460  0.03571395  0.03151038  0.03298835  0.03267354  0.03258440  0.03173267   0.03092486   0.03173799   0.03362969   0.03175636   0.03213912   0.03261412   0.03124107   0.03276036   0.03309290   0.03242072
2  4  0.09892730  0.09685813  0.10862200  0.11227474  0.10475323  0.10236350  0.10103218  0.09245060  0.11279121   0.10408217   0.09943008   0.10620199   0.10963440   0.10265061   0.09649799   0.09900911   0.10601267   0.10419287   0.10721082
3  6  0.03944517  0.03836045  0.03831333  0.04089123  0.03812294  0.03873628  0.03815849  0.03831863  0.04279523   0.03794184   0.03823897   0.04114587   0.03822194   0.03959844   0.03716085   0.03851944   0.04050187   0.03772068   0.03990839
4  8  0.05752119  0.05490139  0.05555257  0.05592480  0.05739641  0.05541309  0.05482406  0.05468170  0.05163860   0.05332810   0.05719358   0.05203597   0.05176091   0.05409443   0.05514705   0.05645480   0.05796241   0.05816801   0.05130621
5 10  0.03400503  0.03539782  0.03376988  0.03681382  0.03326006  0.03387209  0.03340510  0.03498150  0.03357375   0.03353869   0.03341905   0.03500160   0.03391941   0.03492884   0.03418705   0.03291307   0.03489916   0.03415724   0.03479427
6 11  0.11767530  0.09495415  0.09775960  0.10044154  0.09770223  0.08749662  0.11636045  0.09904646  0.09899641   0.10612800   0.10252201   0.08700145   0.09749312   0.10693396   0.10840201   0.10533753   0.11882828   0.10940311   0.10143081
  [ columns pred_logit20 through pred_logit100 omitted for brevity ]

 
 

Bootstrap sample predictions on the validation set

  id pred_val_logit1 pred_val_logit2 pred_val_logit3 pred_val_logit4 pred_val_logit5 pred_val_logit6 pred_val_logit7 pred_val_logit8 pred_val_logit9 pred_val_logit10 pred_val_logit11 pred_val_logit12 pred_val_logit13 pred_val_logit14
1  2      0.02759774      0.02653257      0.02592432      0.02901585      0.02513120      0.02690952      0.02668875      0.02544153      0.02569974       0.02432149       0.02541437       0.02723609       0.02539422       0.02521562
2  3      0.09111374      0.09641764      0.08868128      0.09652013      0.09411716      0.09048192      0.08921644      0.10098606      0.09812846       0.09558083       0.09315198       0.09690039       0.09178538       0.10029707
3  5      0.03026197      0.02894857      0.02804265      0.03171018      0.02739197      0.02940621      0.02920810      0.02778903      0.02776422       0.02633426       0.02767885       0.02966796       0.02747774       0.02730645
4  7      0.03829394      0.03735933      0.03929685      0.03775026      0.03928853      0.03744354      0.03679787      0.03704020      0.03669559       0.03755830       0.03927954       0.03562245       0.03652858       0.03806908
5  9      0.01656117      0.01637529      0.01678601      0.01773840      0.01560045      0.01646173      0.01619318      0.01560729      0.01675975       0.01566653       0.01584487       0.01696067       0.01641542       0.01622749
6 12      0.16479032      0.17491747      0.15325816      0.17307200      0.16948392      0.16228148      0.16100501      0.18570568      0.16739866       0.16903918       0.16677853       0.17156884       0.16017410       0.17724028
  [ columns pred_val_logit15 through pred_val_logit100 omitted for brevity ]

 
 

Compute the 5th-percentile lower-bound probability
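A sketch of this step: take the row-wise 5th percentile across the bootstrap prediction columns and compare it to the 0.024 break-even (random numbers stand in for the real prediction matrix here).

```r
set.seed(1)
boot_pred <- matrix(runif(6 * 100, 0, 0.10), nrow = 6)  # 6 customers x 100 samples

p_lb <- apply(boot_pred, 1, quantile, probs = 0.05)  # conservative probability
mailto_wave2 <- p_lb > 0.024                         # mail only if still profitable
```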

 
 

Training predictions

# A tibble: 6 x 2
     id mailto_wave2
  <int> <lgl>       
1     1 TRUE        
2     4 TRUE        
3     6 TRUE        
4     8 TRUE        
5    10 TRUE        
6    11 TRUE        

 
 

Validation predictions

# A tibble: 6 x 2
     id rfm_resp
  <int>    <dbl>
1     2   0.0462
2     3   0.0693
3     5   0.0505
4     7   0.0248
5     9   0.0205
6    12   0.144 

 
   

Logistic Regression B

with “VI”, “numords”, “last”, “version1”, “owntaxprod”, “upgraded” as predictors.

 
 

Logistic regression (GLM)
Data                 : train
Response variable    : res1
Level                : Yes in res1
Explanatory variables: VI, numords, last, version1, owntaxprod, upgraded 
Null hyp.: there is no effect of x on res1
Alt. hyp.: there is an effect of x on res1

                 OR coefficient std.error z.value p.value    
 (Intercept)             -3.776     0.065 -58.142  < .001 ***
 VI|TRUE     20.517       3.021     0.065  46.461  < .001 ***
 numords      1.339       0.292     0.016  18.291  < .001 ***
 last         0.955      -0.046     0.002 -18.390  < .001 ***
 version1     2.258       0.815     0.054  15.094  < .001 ***
 owntaxprod   1.384       0.325     0.104   3.114   0.002 ** 
 upgraded     2.844       1.045     0.052  20.130  < .001 ***

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Pseudo R-squared: 0.144
Log-likelihood: -8594.701, AIC: 17203.403, BIC: 17265.483
Chi-squared: 2900.241 df(6), p.value < .001 
Nr obs: 52,500 

 

Bootstrap sample predictions on the training set

  id pred_logit1 pred_logit2 pred_logit3 pred_logit4 pred_logit5 pred_logit6 pred_logit7 pred_logit8 pred_logit9 pred_logit10 pred_logit11 pred_logit12 pred_logit13 pred_logit14 pred_logit15 pred_logit16 pred_logit17 pred_logit18 pred_logit19
1  1  0.03255364  0.03240355  0.03092583  0.03463464  0.03040415  0.03174203  0.03175792  0.03124129  0.03044998   0.02963682   0.03045392   0.03271192   0.03116484   0.03142775   0.03195773   0.03050697   0.03192694   0.03218207   0.03153167
2  4  0.01393851  0.01401335  0.01454437  0.01514301  0.01320250  0.01375657  0.01376026  0.01301222  0.01435986   0.01332014   0.01335863   0.01449886   0.01419049   0.01412621   0.01371698   0.01398539   0.01440709   0.01359692   0.01442638
3  6  0.03814915  0.03699220  0.03810736  0.04023595  0.03728228  0.03729778  0.03720487  0.03695550  0.04190864   0.03717999   0.03709756   0.04101236   0.03702016   0.03871725   0.03664923   0.03825281   0.04077682   0.03711319   0.03904759
4  8  0.05648616  0.05335497  0.05423993  0.05588224  0.05768392  0.05438269  0.05425305  0.05364427  0.05087367   0.05293504   0.05728093   0.05070596   0.05081839   0.05384368   0.05391388   0.05659069   0.05668984   0.05677823   0.05130205
5 10  0.03327651  0.03488965  0.03312637  0.03603131  0.03246416  0.03322403  0.03276011  0.03410024  0.03269297   0.03255651   0.03246666   0.03444978   0.03372893   0.03446618   0.03388593   0.03216208   0.03437519   0.03359114   0.03402267
6 11  0.12272292  0.09577005  0.10122000  0.10696755  0.10044470  0.09221386  0.12026857  0.10281456  0.10543116   0.10830848   0.10217258   0.09022829   0.10074825   0.10805934   0.11138847   0.11211335   0.12340388   0.11607161   0.10309759
  [ columns pred_logit20 through pred_logit100 omitted for brevity ]
5   0.03255787   0.03205948   0.03383415   0.03282929   0.03429171   0.03243594   0.03362534   0.03470729   0.03330895   0.03260170   0.03629898   0.03373054   0.03418233   0.03288189   0.03372696   0.03375762   0.03422116   0.03347288   0.03707727
6   0.11095212   0.11704866   0.11646868   0.09522535   0.10152700   0.12579685   0.08893513   0.10140605   0.09986982   0.10557321   0.08983256   0.09577795   0.12924931   0.08761580   0.09401225   0.10582771   0.11061621   0.09280589   0.10017519
  pred_logit58 pred_logit59 pred_logit60 pred_logit61 pred_logit62 pred_logit63 pred_logit64 pred_logit65 pred_logit66 pred_logit67 pred_logit68 pred_logit69 pred_logit70 pred_logit71 pred_logit72 pred_logit73 pred_logit74 pred_logit75 pred_logit76
1   0.03218559   0.03137801   0.03224830   0.02932107   0.03113869   0.03335024   0.03231157   0.03219600   0.03235116   0.03390787   0.03179985   0.03173849   0.03138105   0.03112604   0.03167976   0.03173558   0.03110440   0.03330764   0.03257801
2   0.01435784   0.01393667   0.01512904   0.01362730   0.01429369   0.01505847   0.01445477   0.01476971   0.01474433   0.01344113   0.01512599   0.01390100   0.01329616   0.01320791   0.01364182   0.01384583   0.01389605   0.01427589   0.01438258
3   0.04240281   0.03684324   0.04293484   0.03847534   0.03815842   0.04138927   0.03913427   0.03974480   0.04022264   0.03340294   0.04290110   0.03895599   0.03690273   0.03697595   0.03779577   0.03793932   0.03999126   0.03870426   0.03961727
4   0.05252751   0.05213909   0.05422207   0.05136432   0.05113407   0.05600253   0.05640534   0.05483633   0.05958305   0.05514408   0.05497517   0.05287187   0.05486904   0.05883691   0.05456605   0.05552353   0.05776721   0.05608644   0.05429508
5   0.03386879   0.03402000   0.03365368   0.03223331   0.03415019   0.03456639   0.03420928   0.03487294   0.03330755   0.03513506   0.03322253   0.03393931   0.03342985   0.03295669   0.03308169   0.03396226   0.03267802   0.03409333   0.03475203
6   0.10588279   0.11238167   0.11355496   0.11294664   0.08726085   0.11652259   0.11738844   0.09053004   0.10551809   0.09640673   0.09655124   0.08417999   0.11923615   0.10740516   0.10446648   0.08938351   0.09952107   0.10229919   0.09777881
  pred_logit77 pred_logit78 pred_logit79 pred_logit80 pred_logit81 pred_logit82 pred_logit83 pred_logit84 pred_logit85 pred_logit86 pred_logit87 pred_logit88 pred_logit89 pred_logit90 pred_logit91 pred_logit92 pred_logit93 pred_logit94 pred_logit95
1   0.03158935   0.02955313   0.03384498   0.03246340   0.03359654   0.03297347   0.03243152   0.03118429   0.03309471   0.03121815   0.03091185   0.03150197   0.03311466   0.03036638   0.03062421   0.03317167   0.03069998   0.02991445   0.03277872
2   0.01405137   0.01334398   0.01534502   0.01446092   0.01429857   0.01384465   0.01521351   0.01391887   0.01468819   0.01386821   0.01350647   0.01457146   0.01372975   0.01389626   0.01378076   0.01377409   0.01365176   0.01345066   0.01407796
3   0.03863067   0.03767656   0.04204080   0.03972448   0.03716032   0.03825928   0.03966671   0.03972823   0.03995393   0.03766683   0.03825741   0.04007000   0.03569482   0.03821361   0.03770855   0.03824742   0.03776903   0.03880653   0.04120025
4   0.05370648   0.05215010   0.05858310   0.05266057   0.05803350   0.05526674   0.05577052   0.05320518   0.05644756   0.04978288   0.05527917   0.05263581   0.05537448   0.05040052   0.05719794   0.05569063   0.05566087   0.05533026   0.05420850
5   0.03368397   0.03159272   0.03536704   0.03400896   0.03482074   0.03461193   0.03548195   0.03353132   0.03441424   0.03452680   0.03266346   0.03375083   0.03557046   0.03287240   0.03217057   0.03515489   0.03230330   0.03184725   0.03444563
6   0.11148721   0.08942446   0.10421740   0.11570420   0.09760876   0.10456413   0.10077464   0.08844413   0.10292706   0.11046367   0.11101268   0.12451204   0.11476834   0.11388365   0.11061751   0.09958070   0.09864478   0.10988191   0.10807692
  pred_logit96 pred_logit97 pred_logit98 pred_logit99 pred_logit100
1   0.03282671   0.03069240   0.03155149   0.03059981    0.03313133
2   0.01460924   0.01327535   0.01361640   0.01368814    0.01464690
3   0.03733080   0.03600613   0.03658927   0.04095401    0.03985862
4   0.04964071   0.05356156   0.05622536   0.05463704    0.05177865
5   0.03625531   0.03333129   0.03335821   0.03286440    0.03485856
6   0.10064428   0.10626658   0.09826718   0.10553530    0.11135773

 
 

Bootstrap sample prediction on the validation set

  id pred_val_logit1 pred_val_logit2 pred_val_logit3 pred_val_logit4 pred_val_logit5 pred_val_logit6 pred_val_logit7 pred_val_logit8 pred_val_logit9 pred_val_logit10 pred_val_logit11 pred_val_logit12 pred_val_logit13 pred_val_logit14
1  2      0.02625721      0.02516754      0.02459044      0.02769663      0.02378860      0.02520981      0.02550576      0.02380040      0.02418359       0.02286748       0.02391252       0.02599389       0.02439953       0.02428223
2  3      0.09406669      0.09944895      0.09328765      0.10017989      0.09814632      0.09496535      0.09228711      0.10549943      0.10213754       0.09962365       0.09646790       0.10277638       0.09547382       0.10328065
3  5      0.02892089      0.02752161      0.02664544      0.03036893      0.02602751      0.02765230      0.02802509      0.02609985      0.02618923       0.02483709       0.02613697       0.02841706       0.02650642       0.02637795
4  7      0.03687153      0.03594319      0.03808032      0.03718767      0.03884859      0.03615130      0.03578288      0.03571288      0.03577802       0.03678510       0.03874139       0.03415940       0.03523739       0.03737997
5  9      0.01537223      0.01534052      0.01577341      0.01662479      0.01445984      0.01510658      0.01513745      0.01428463      0.01556378       0.01447961       0.01461601       0.01586791       0.01542977       0.01535919
6 12      0.17556151      0.18481967      0.16430473      0.18388950      0.18174035      0.17618557      0.17130228      0.20021617      0.17825923       0.18064884       0.17764056       0.18666085       0.17193871       0.18674077
  pred_val_logit15 pred_val_logit16 pred_val_logit17 pred_val_logit18 pred_val_logit19 pred_val_logit20 pred_val_logit21 pred_val_logit22 pred_val_logit23 pred_val_logit24 pred_val_logit25 pred_val_logit26 pred_val_logit27 pred_val_logit28
1       0.02507262       0.02441075       0.02504433       0.02546352       0.02477239       0.02611225       0.02533007       0.02318727       0.02410014       0.02659251       0.02558810       0.02539557       0.02683301       0.02470677
2       0.09655236       0.09321918       0.10450487       0.09642753       0.09938478       0.10153589       0.09830088       0.09822797       0.09744244       0.09560100       0.10009642       0.09539130       0.09479190       0.09796894
3       0.02749118       0.02657917       0.02725170       0.02802237       0.02690584       0.02857566       0.02774673       0.02527578       0.02640736       0.02901732       0.02820023       0.02776240       0.02923248       0.02702241
4       0.03589776       0.03891672       0.03908647       0.03722341       0.03560631       0.04020452       0.03905017       0.03471152       0.03564631       0.04100912       0.03667057       0.03732617       0.03889856       0.03907826
5       0.01505707       0.01524219       0.01569200       0.01498162       0.01568314       0.01584934       0.01529296       0.01438897       0.01452863       0.01640159       0.01493375       0.01550566       0.01669910       0.01504609
6       0.18041406       0.16671340       0.18776777       0.18174701       0.17772427       0.18453587       0.18177917       0.17963806       0.18123314       0.17015658       0.19061303       0.17513057       0.17040326       0.18369326
  pred_val_logit29 pred_val_logit30 pred_val_logit31 pred_val_logit32 pred_val_logit33 pred_val_logit34 pred_val_logit35 pred_val_logit36 pred_val_logit37 pred_val_logit38 pred_val_logit39 pred_val_logit40 pred_val_logit41 pred_val_logit42
1       0.02462013       0.02412204       0.02449808       0.02503988       0.02431182       0.02410964       0.02340480       0.02423916       0.02563860       0.02738560       0.02412810       0.02411177       0.02488132       0.02325633
2       0.09831907       0.09716601       0.10068735       0.10061892       0.10558269       0.09731933       0.10214199       0.09958704       0.09558126       0.09632697       0.09500524       0.09658025       0.09735189       0.10395009
3       0.02701565       0.02634712       0.02664812       0.02726520       0.02653229       0.02614505       0.02566711       0.02659851       0.02825528       0.02993725       0.02632538       0.02650336       0.02737645       0.02536268
4       0.03716804       0.03574591       0.03761723       0.03676687       0.03550879       0.03422170       0.03638529       0.03591507       0.03943983       0.03941007       0.03730031       0.03648640       0.03744397       0.03468897
5       0.01472358       0.01480248       0.01538020       0.01563060       0.01498789       0.01539820       0.01404467       0.01449381       0.01496490       0.01671828       0.01489514       0.01428318       0.01465595       0.01439494
6       0.18288596       0.17577169       0.17874682       0.18250705       0.19242759       0.17379703       0.19244966       0.18614815       0.17979908       0.17404132       0.17316296       0.18136497       0.18543567       0.19098434
  pred_val_logit43 pred_val_logit44 pred_val_logit45 pred_val_logit46 pred_val_logit47 pred_val_logit48 pred_val_logit49 pred_val_logit50 pred_val_logit51 pred_val_logit52 pred_val_logit53 pred_val_logit54 pred_val_logit55 pred_val_logit56
1       0.02364355       0.02403589       0.02534964       0.02373755       0.02407277       0.02215790       0.02517823       0.02445729       0.02622694       0.02453822       0.02441424       0.02421247       0.02572925       0.02303592
2       0.10233164       0.09237258       0.10311460       0.10405512       0.09668289       0.10153286       0.10427609       0.09843630       0.09940567       0.09267169       0.10257091       0.10532697       0.09845568       0.10286751
3       0.02566833       0.02614090       0.02776315       0.02572118       0.02637545       0.02403363       0.02747908       0.02668335       0.02865336       0.02672869       0.02649110       0.02653165       0.02820947       0.02489206
4       0.03463672       0.03768843       0.03765013       0.03686841       0.03637227       0.03409584       0.03704319       0.03916239       0.03812348       0.03750088       0.03530483       0.03547409       0.03811208       0.03493836
5       0.01500763       0.01510518       0.01532012       0.01522805       0.01451807       0.01413808       0.01551636       0.01510149       0.01606728       0.01528732       0.01554076       0.01459240       0.01545509       0.01500765
6       0.18622437       0.16601368       0.18892382       0.18791435       0.18164788       0.18500889       0.19417653       0.18051970       0.17925362       0.16738468       0.18242898       0.19699190       0.18241835       0.18284731
  pred_val_logit57 pred_val_logit58 pred_val_logit59 pred_val_logit60 pred_val_logit61 pred_val_logit62 pred_val_logit63 pred_val_logit64 pred_val_logit65 pred_val_logit66 pred_val_logit67 pred_val_logit68 pred_val_logit69 pred_val_logit70
1       0.02587622       0.02562911       0.02439129       0.02614894       0.02278813       0.02417629       0.02694554       0.02562564       0.02523568       0.02632933       0.02658592       0.02584577       0.02485499       0.02445751
2       0.10582880       0.10518945       0.09769565       0.09987400       0.09969962       0.09982497       0.09896774       0.09839731       0.10153307       0.09426046       0.09181260       0.09864595       0.10178324       0.09919868
3       0.02823107       0.02800002       0.02656885       0.02842816       0.02465225       0.02619851       0.02944812       0.02796711       0.02738797       0.02876620       0.02949970       0.02804992       0.02716172       0.02684293
4       0.03630412       0.03551741       0.03573765       0.03748764       0.03631614       0.03585918       0.03783429       0.03836760       0.03822634       0.04036622       0.03480509       0.03831675       0.03572262       0.03639727
5       0.01597588       0.01570287       0.01519542       0.01646397       0.01475336       0.01550242       0.01647572       0.01579207       0.01604407       0.01612672       0.01493636       0.01643137       0.01520697       0.01460928
6       0.19602338       0.18973348       0.17868436       0.17463717       0.17666563       0.17844655       0.17770739       0.17839837       0.18104570       0.16862524       0.18154665       0.17102289       0.18687890       0.18660804
  pred_val_logit71 pred_val_logit72 pred_val_logit73 pred_val_logit74 pred_val_logit75 pred_val_logit76 pred_val_logit77 pred_val_logit78 pred_val_logit79 pred_val_logit80 pred_val_logit81 pred_val_logit82 pred_val_logit83 pred_val_logit84
1       0.02438140       0.02516797       0.02481553       0.02480673       0.02684468       0.02560864       0.02488180       0.02331207       0.02720007       0.02593105       0.02677487       0.02593961       0.02535573       0.02442389
2       0.09813283       0.09634442       0.09980836       0.09907536       0.09553926       0.10221478       0.09896969       0.09558748       0.10155173       0.09831294       0.09472502       0.10089334       0.10101565       0.10277533
3       0.02677385       0.02763456       0.02712846       0.02710248       0.02955939       0.02796698       0.02715106       0.02538703       0.02968355       0.02834926       0.02946388       0.02854792       0.02741447       0.02661490
4       0.03899230       0.03611161       0.03748821       0.03913814       0.03664257       0.03679770       0.03653512       0.03580028       0.03986320       0.03549590       0.03805021       0.03619544       0.03954621       0.03641650
5       0.01452027       0.01499617       0.01515221       0.01519780       0.01574012       0.01572378       0.01534843       0.01454491       0.01676474       0.01582686       0.01575486       0.01525582       0.01646267       0.01518217
6       0.18442021       0.17923081       0.18409804       0.17938908       0.17802562       0.18658136       0.18018546       0.17269518       0.18167675       0.17850907       0.17785814       0.19035779       0.17774865       0.18623461
  pred_val_logit85 pred_val_logit86 pred_val_logit87 pred_val_logit88 pred_val_logit89 pred_val_logit90 pred_val_logit91 pred_val_logit92 pred_val_logit93 pred_val_logit94 pred_val_logit95 pred_val_logit96 pred_val_logit97 pred_val_logit98
1       0.02656484       0.02389774       0.02441706       0.02494816       0.02551652       0.02379734       0.02446619       0.02583690       0.02442808       0.02364952       0.02590009       0.02516777       0.02368097       0.02484040
2       0.09791331       0.10335694       0.09811185       0.09946537       0.09980647       0.09787047       0.09304765       0.10374066       0.09455829       0.09796628       0.10598354       0.10218443       0.09834685       0.09536657
3       0.02907935       0.02597025       0.02672811       0.02708456       0.02804814       0.02583673       0.02670839       0.02843989       0.02669860       0.02577938       0.02842577       0.02734846       0.02587037       0.02722871
4       0.03787116       0.03446162       0.03710769       0.03661856       0.03647192       0.03504478       0.03889275       0.03645175       0.03762834       0.03785548       0.03592999       0.03435529       0.03626736       0.03751941
5       0.01609710       0.01508437       0.01480051       0.01583369       0.01511008       0.01510027       0.01505881       0.01518073       0.01493598       0.01467583       0.01546910       0.01588999       0.01451701       0.01494209
6       0.17814289       0.18858073       0.18066943       0.17633195       0.19078597       0.17522708       0.16851138       0.19693506       0.17247471       0.17704313       0.19598336       0.18649614       0.18304957       0.17763050
  pred_val_logit99 pred_val_logit100
1       0.02399963        0.02633093
2       0.10524509        0.10017293
3       0.02614965        0.02879715
4       0.03743954        0.03482690
5       0.01492853        0.01603679
6       0.18992416        0.18269133
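The bootstrap prediction columns above can be produced by resampling the training data with replacement, refitting the logit, and predicting each time. A minimal base-R sketch on synthetic stand-in data (the real code would use the intuit75k training/validation split and the model's full formula):

```r
## Sketch of the bootstrap procedure behind the prediction tables above.
## Synthetic stand-in data; variable names are illustrative only.
set.seed(1234)
train <- data.frame(
  res1 = factor(sample(c("Yes", "No"), 500, TRUE, c(0.05, 0.95))),
  last = rpois(500, 15), numords = rpois(500, 2)
)
val <- train[1:100, ]

preds <- sapply(1:100, function(i) {
  boot <- train[sample(nrow(train), replace = TRUE), ]      # bootstrap sample
  fit  <- glm(res1 ~ last + numords, family = binomial, data = boot)
  predict(fit, newdata = val, type = "response")            # predicted P(res1 = "Yes")
})
colnames(preds) <- paste0("pred_val_logit", 1:100)
```

Each column of `preds` corresponds to one refitted model, matching the `pred_val_logit1` through `pred_val_logit100` columns shown above.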

 
 

Get the 5th-percentile lower-bound probability
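The lower bound is taken row-wise (per customer) across the 100 bootstrap columns. A base-R sketch, with `preds` standing in for the n x 100 bootstrap prediction matrix:

```r
## 5th percentile across the 100 bootstrap predictions, per customer.
## `preds` is a synthetic stand-in for the matrix built in the bootstrap step.
preds <- matrix(runif(6 * 100, 0.01, 0.15), nrow = 6)
prob_log_lb <- apply(preds, 1, quantile, probs = 0.05)  # row-wise 5th percentile
```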

 
 

Training prediction

# A tibble: 6 x 2
     id prob_log_lbA
  <int>        <dbl>
1     1       0.0311
2     4       0.0936
3     6       0.0371
4     8       0.0508
5    10       0.0326
6    11       0.0875

 

Validation prediction

# A tibble: 6 x 2
     id prob_log_lbA
  <int>        <dbl>
1     2       0.0246
2     3       0.0895
3     5       0.0267
4     7       0.0356
5     9       0.0155
6    12       0.158 

 
 

Compare the performance of Logistic model A with Logistic model B

 

Model performance using “zip_one”

   

Training

[1] “Based on our analysis, the number of customers Intuit should mail is 32,014 that is 60.98% of the customers.
The response rate for the selected customers is predicted to be 6.78%, or, 2,171 buyers; while the actual response rate is 4.76%, or, 2,498.
The predicted margin is $130,260.00; while actual margin is $149,880.00.
The expected profit is $85,120. The messaging cost is estimated to be $45,140 with a ROME of 1.89.”  
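The reported figures are consistent with wave-2 economics of a $60 margin per responder and a $1.41 cost per mailing (e.g., $45,140 / 32,014 ≈ $1.41 and 2,171 × $60 = $130,260). A sketch of the calculation on stand-in lower-bound probabilities:

```r
## Mail a customer only if the lower-bound response probability clears break-even.
margin <- 60      # margin per wave-2 responder (implied by the figures above)
cost   <- 1.41    # cost per wave-2 mailing
prob_lb <- c(0.10, 0.031, 0.037, 0.051, 0.02, 0.0875)  # stand-in lower bounds

mailto     <- prob_lb > cost / margin        # break-even is ~0.0235
n_mail     <- sum(mailto)
exp_buyers <- sum(prob_lb[mailto])           # predicted responders among mailed
exp_profit <- exp_buyers * margin - n_mail * cost
rome       <- exp_profit / (n_mail * cost)   # return on marketing expenditure
```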

Validation

[1] “Based on our analysis, the number of customers Intuit should mail is 13,902 that is 61.79% of the customers.
The response rate for the selected customers is predicted to be 6.91%, or, 961 buyers; while the actual response rate is 4.90%, or, 1,103.
The predicted margin is $57,660.00; while actual margin is $66,180.00.
The expected profit is $38,058. The messaging cost is estimated to be $19,602 with a ROME of 1.94.”    

Model performance using “VI”

   

Training

[1] “Based on our analysis, the number of customers Intuit should mail is 30,293 that is 57.70% of the customers.
The response rate for the selected customers is predicted to be 7.05%, or, 2,137 buyers; while the actual response rate is 4.76%, or, 2,498.
The predicted margin is $128,220.00; while actual margin is $149,880.00.
The expected profit is $85,507. The messaging cost is estimated to be $42,713 with a ROME of 2.00.”  

Validation

[1] “Based on our analysis, the number of customers Intuit should mail is 13,131 that is 58.36% of the customers.
The response rate for the selected customers is predicted to be 7.20%, or, 946 buyers; while the actual response rate is 4.90%, or, 1,103.
The predicted margin is $56,760.00; while actual margin is $66,180.00.
The expected profit is $38,245. The messaging cost is estimated to be $18,515 with a ROME of 2.07.”

   

Profit

 

   

ROME

 

       

Naive Bayes

Based on the results from the logistic regression model, we’ll use “VI”, “numords”, “dollars”, “last”, “version1”, “owntaxprod” and “upgraded” in the Naive Bayes model.

 

Created with Laplace = 1 to avoid the situation where the classifier assigns probability 0 to events unseen in training

Naive Bayes Classifier
Data                 : train
Response variable    : res1
Levels               : Yes, No in res1
Explanatory variables: VI, numords, dollars, last, version1, owntaxprod, upgraded
Laplace              : 1
Nr obs               : 52,500 

A-priori probabilities:
res1
  Yes    No 
0.048 0.952 

Conditional probabilities (categorical) or means & st.dev (numeric):
     VI
res1  FALSE  TRUE
  Yes 0.787 0.213
  No  0.984 0.016

     numords
res1   mean st.dev
  Yes 2.568  1.436
  No  2.046  1.224

     dollars
res1     mean  st.dev
  Yes 117.000 103.241
  No   91.467  79.422

     last
res1    mean st.dev
  Yes 12.022  8.942
  No  16.048  9.536

     version1
res1   mean st.dev
  Yes 0.285  0.451
  No  0.210  0.407

     owntaxprod
res1   mean st.dev
  Yes 0.049  0.216
  No  0.028  0.164

     upgraded
res1   mean st.dev
  Yes 0.335  0.472
  No  0.201  0.401
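Laplace = 1 adds one pseudo-count to every level of a categorical predictor, so a level never observed within a class still receives a small non-zero conditional probability instead of zeroing out the whole product. An illustration with made-up counts (not from the Intuit data):

```r
## Conditional probability P(VI = TRUE | res1 = Yes) with Laplace smoothing.
## Hypothetical counts for illustration only.
laplace  <- 1
n_yes    <- 500   # customers with res1 = Yes
n_vi_yes <- 0     # of those, none with VI = TRUE (an unseen event)
k        <- 2     # number of levels of VI (TRUE/FALSE)

p_unsmoothed <- n_vi_yes / n_yes                              # 0 -> kills the product
p_smoothed   <- (n_vi_yes + laplace) / (n_yes + laplace * k)  # small but non-zero
```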

 

Training Prediction  

Naive Bayes Classifier
Data                 : train 
Response variable    : res1 
Level(s)             : Yes, No in res1 
Explanatory variables: VI, numords, dollars, last, version1, owntaxprod, upgraded 
Prediction dataset   : train 
Rows shown           : 10 of 52,500 

    VI numords dollars last version1 owntaxprod upgraded   Yes    No
 FALSE       2 109.500    5        0          0        0 0.018 0.982
 FALSE       1  22.000   17        0          0        0 0.009 0.991
 FALSE       1  20.000   17        0          0        1 0.026 0.974
 FALSE       1  24.500    4        1          0        0 0.029 0.971
 FALSE       3  73.500   10        0          0        0 0.019 0.981
 FALSE       2  99.500    7        0          1        1 0.993 0.007
 FALSE       1  49.500   22        0          0        0 0.006 0.994
 FALSE       1  52.000   22        0          0        0 0.006 0.994
 FALSE       1  69.500   27        0          0        0 0.005 0.995
 FALSE       4 264.500   15        0          0        1 0.246 0.754

Validation Prediction
 

Naive Bayes Classifier
Data                 : train 
Response variable    : res1 
Level(s)             : Yes, No in res1 
Explanatory variables: VI, numords, dollars, last, version1, owntaxprod, upgraded 
Prediction dataset   : val 
Rows shown           : 10 of 22,500 

    VI numords dollars last version1 owntaxprod upgraded   Yes    No
 FALSE       1  69.500    4        0          0        0 0.014 0.986
 FALSE       4  93.000   14        0          0        1 0.079 0.921
 FALSE       1  24.500    2        0          0        0 0.016 0.984
 FALSE       1  49.500   13        1          0        0 0.020 0.980
 FALSE       1  44.500   15        0          0        0 0.009 0.991
 FALSE       5  79.000    5        0          0        1 0.196 0.804
 FALSE       1  38.000    5        1          0        0 0.027 0.973
 FALSE       2  40.500   10        0          0        0 0.013 0.987
 FALSE       2 105.500    9        0          0        0 0.015 0.985
 FALSE       2 136.000   27        0          0        0 0.007 0.993

   

Evaluate the performance of the Naive Bayes model

 
Training

[1] “Based on our analysis, the number of customers Intuit should mail is 21,502 that is 40.96% of the customers.
The response rate for the selected customers is predicted to be 8.86%, or, 1,905 buyers; while the actual response rate is 4.76%, or, 2,498.
The predicted margin is $114,300.00; while actual margin is $149,880.00.
The expected profit is $83,982. The messaging cost is estimated to be $30,318 with a ROME of 2.77.”  

Validation

[1] “Based on our analysis, the number of customers Intuit should mail is 9,276 that is 41.23% of the customers.
The response rate for the selected customers is predicted to be 9.10%, or, 844 buyers; while the actual response rate is 4.90%, or, 1,103.
The predicted margin is $50,640.00; while actual margin is $66,180.00.
The expected profit is $37,561. The messaging cost is estimated to be $13,079 with a ROME of 2.87.”        

Neural Network Model

 

Given the distinct ability of neural network models to capture complex relationships and interactions between the features and the response variable, we first feed the model all the important features to check whether the variable importance derived from the neural network conforms to the conclusions we reached earlier.
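radiant.model’s `nn()` builds on the nnet package, where `size` is the number of hidden units and `decay` the weight-decay penalty. A runnable sketch on synthetic two-predictor data (the real fit uses the nine intuit75k variables, giving the 9-2-1 network with (9+1)*2 + (2+1)*1 = 23 weights reported below):

```r
## Sketch of the underlying nnet fit; synthetic stand-in data.
library(nnet)
set.seed(1234)
d <- data.frame(res1 = factor(sample(c("Yes", "No"), 200, TRUE)),
                last = rnorm(200), numords = rnorm(200))
fit <- nnet(res1 ~ last + numords, data = d,
            size = 2, decay = 0.5, trace = FALSE)
## a 2-2-1 network has (2+1)*2 + (2+1)*1 = 9 weights
```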

 

Neural Network
Activation function  : Logistic (classification)
Data                 : intuit75k
Filter               : train <- training == 1
Response variable    : res1
Level                : Yes in res1
Explanatory variables: numords, dollars, last, sincepurch, version1, owntaxprod, upgraded, zip_one, VI 
Network size         : 2 
Parameter decay      : 0.5 
Seed                 : 1234 
Network              : 9-2-1 with 23 weights
Nr obs               : 52,500 
Weights              :
   b->h1 i1->h1 i2->h1 i3->h1 i4->h1 i5->h1 i6->h1 i7->h1 i8->h1 i9->h1 
    1.29  -0.27  -0.20   0.52   0.02  -1.78  -0.08  -0.70  -0.10  -2.68 
   b->h2 i1->h2 i2->h2 i3->h2 i4->h2 i5->h2 i6->h2 i7->h2 i8->h2 i9->h2 
   -1.40  -1.23   0.01   1.35  -0.13   2.11  -0.40  -1.54   0.32  -2.15 
   b->o h1->o h2->o 
   0.89 -4.74 -3.01  

The Olden plot shows that “dollars”, “last”, “sincepurch” and “zip_one” are relatively less important factors for predicting purchase probability. This aligns with our earlier findings, so we’ll build the neural network model using the same variables as in the logistic regression. As with the logistic regression models, we’ll use the 5th-percentile lower bound of the bootstrap predictions as the purchase probability used to label customers.
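Olden importance ranks each input by its connection weights: the sum, over hidden nodes, of the input-to-hidden weight times the hidden-to-output weight (bias weights excluded). A tiny base-R illustration with made-up weights for a 3-2-1 network:

```r
## Olden connection-weight importance; weights are hypothetical.
w_ih <- matrix(c( 0.5, -1.2,    # input 1 -> h1, h2
                  0.1,  0.3,    # input 2 -> h1, h2
                 -2.0,  1.5),   # input 3 -> h1, h2
               nrow = 3, byrow = TRUE)
w_ho <- c(-4.0, 3.0)            # h1 -> output, h2 -> output
olden <- as.vector(w_ih %*% w_ho)  # one importance score per input
```

Inputs with scores near zero (here input 2) contribute little through either hidden node, which is what the Olden plot visualizes.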

   

Bootstrap sample prediction on the training set

  id      trnn1      trnn2      trnn3      trnn4      trnn5      trnn6      trnn7      trnn8      trnn9     trnn10     trnn11     trnn12     trnn13     trnn14     trnn15     trnn16     trnn17     trnn18     trnn19     trnn20     trnn21     trnn22
1  1 0.03056845 0.03555882 0.03154480 0.03482594 0.02942755 0.03239208 0.03254379 0.02994816 0.02858426 0.03096297 0.03101345 0.03153061 0.02942230 0.03164288 0.03284576 0.02820377 0.03308329 0.03203990 0.03368561 0.03264220 0.03278554 0.03094488
2  4 0.01515693 0.01518377 0.01474121 0.01396486 0.01303368 0.01414603 0.01290520 0.01307065 0.01459073 0.01488323 0.01280862 0.01470844 0.01417473 0.01486062 0.01432369 0.01446864 0.01477383 0.01498237 0.01339287 0.01549035 0.01305473 0.01428503
3  6 0.04432302 0.04820826 0.04861266 0.04552652 0.05177538 0.04584605 0.04786150 0.04886100 0.05465430 0.04458509 0.04427276 0.04783514 0.05490525 0.05248175 0.04403792 0.05114751 0.05127221 0.04499102 0.04972441 0.05358336 0.05198805 0.04658951
4  8 0.05217575 0.04573376 0.04513357 0.04716904 0.05027743 0.04461655 0.04683238 0.04457962 0.04057391 0.04529554 0.04965399 0.04261595 0.04409941 0.04218243 0.04525841 0.04422806 0.04335192 0.04971193 0.04082573 0.04968094 0.05065133 0.04058117
5 10 0.03122532 0.03727811 0.03312804 0.03796901 0.03004342 0.03385746 0.03371564 0.03209299 0.02817365 0.03273792 0.03347122 0.03280904 0.02653537 0.03123775 0.03450424 0.02656418 0.03436497 0.03285720 0.03604917 0.03210879 0.03543793 0.03236554
6 11 0.10946139 0.07517193 0.10491860 0.10493253 0.08805213 0.07990543 0.09772434 0.09777240 0.08910189 0.11520451 0.11446325 0.10545570 0.07556250 0.09178360 0.10593985 0.10708437 0.13693278 0.11054531 0.10047697 0.08460489 0.10608465 0.12365925
      trnn23     trnn24     trnn25     trnn26     trnn27     trnn28     trnn29     trnn30     trnn31     trnn32     trnn33     trnn34     trnn35     trnn36     trnn37     trnn38     trnn39     trnn40     trnn41     trnn42     trnn43     trnn44
1 0.03077010 0.03258638 0.03402939 0.03164322 0.03366385 0.03437379 0.02990248 0.03088144 0.03103466 0.03010043 0.03143800 0.03178932 0.03127442 0.03003255 0.03069298 0.03306565 0.02961836 0.02872022 0.03188145 0.03217048 0.03278700 0.02708618
2 0.01588395 0.01508675 0.01476833 0.01564184 0.01649764 0.01429407 0.01355014 0.01344067 0.01507583 0.01486474 0.01506338 0.01458281 0.01380003 0.01287703 0.01336259 0.01640273 0.01531111 0.01293900 0.01441258 0.01315344 0.01338322 0.01592481
3 0.04382732 0.04965266 0.04530435 0.04344983 0.04798404 0.04678445 0.05324458 0.04869014 0.05040315 0.05186434 0.04949271 0.04709242 0.04619128 0.05282603 0.04429050 0.04453908 0.04436010 0.04459817 0.04257106 0.04973371 0.04719944 0.04875797
4 0.04281945 0.05477926 0.05006558 0.04901112 0.04580933 0.04642351 0.04458683 0.04223482 0.04500079 0.04027886 0.04388732 0.04034477 0.04810500 0.04341481 0.05767093 0.04831966 0.04452878 0.05028090 0.04623357 0.04177355 0.03774349 0.04248782
5 0.03173186 0.03358742 0.03482335 0.03311864 0.03462887 0.03642484 0.02684551 0.03206193 0.03217106 0.03011240 0.03290339 0.03367820 0.03302236 0.03093952 0.03131669 0.03280250 0.03048035 0.02922359 0.03310744 0.03414386 0.03552118 0.02682099
6 0.12636775 0.10741063 0.09912223 0.10774628 0.10395524 0.10140117 0.09217720 0.09456835 0.11809586 0.10923143 0.11269842 0.09582681 0.11470331 0.08264571 0.09271192 0.13459569 0.10949712 0.10777083 0.10858145 0.08324608 0.10014981 0.15215433
      trnn45     trnn46     trnn47     trnn48     trnn49     trnn50     trnn51     trnn52     trnn53     trnn54     trnn55     trnn56     trnn57     trnn58     trnn59     trnn60     trnn61     trnn62     trnn63     trnn64     trnn65     trnn66
1 0.03325865 0.03228667 0.03205828 0.03105531 0.03362680 0.03015567 0.03229598 0.02996165 0.02985849 0.02970134 0.03480507 0.02663917 0.03458925 0.03256555 0.03345142 0.03225133 0.02821758 0.03034434 0.03005434 0.03017309 0.02871995 0.02739069
2 0.01407331 0.01483875 0.01344690 0.01284776 0.01524536 0.01519733 0.01650272 0.01518879 0.01469979 0.01330812 0.01280315 0.01542449 0.01501590 0.01526331 0.01282318 0.01412470 0.01372277 0.01510995 0.01556678 0.01441936 0.01639268 0.01943881
3 0.05084926 0.04760608 0.04169290 0.04656716 0.04447221 0.04537543 0.04627230 0.04373133 0.04909093 0.05175154 0.04637782 0.04159442 0.05073475 0.05119695 0.04911400 0.05632066 0.04631622 0.05143734 0.05696204 0.05449331 0.04934853 0.04414257
4 0.04726998 0.04160888 0.04678452 0.03742572 0.04347723 0.04706759 0.05407493 0.04774190 0.04316064 0.04337522 0.04768959 0.04705801 0.04128829 0.04343140 0.04173475 0.04831426 0.04291267 0.03888782 0.04535297 0.04657168 0.04289884 0.04815002
5 0.03451934 0.03483334 0.03360170 0.03370554 0.03598966 0.03180459 0.03341011 0.03133056 0.03232990 0.03006973 0.03637840 0.03009419 0.03524951 0.03379217 0.03600006 0.03324831 0.03053359 0.03030021 0.02932835 0.02962614 0.02950076 0.02410189
6 0.09492696 0.10583276 0.10936657 0.11550684 0.09191867 0.10635957 0.13444607 0.08961214 0.11789436 0.07823643 0.11149488 0.10460627 0.08691435 0.09841285 0.10740551 0.07507597 0.13230595 0.08245547 0.10136179 0.09512566 0.08239463 0.11498935
      trnn67     trnn68     trnn69     trnn70     trnn71     trnn72     trnn73     trnn74     trnn75     trnn76     trnn77     trnn78     trnn79     trnn80     trnn81     trnn82     trnn83     trnn84     trnn85     trnn86     trnn87     trnn88
1 0.03216211 0.03395770 0.03032274 0.02656046 0.03210980 0.03222284 0.03217943 0.02742064 0.03400398 0.03153937 0.03266180 0.03018257 0.03391795 0.03386751 0.03234419 0.03461222 0.03278930 0.03243118 0.03085890 0.02906777 0.03000092 0.03297784
2 0.01290343 0.01472067 0.01383532 0.01483890 0.01501553 0.01380080 0.01548497 0.01471465 0.01437018 0.01431126 0.01377155 0.01501914 0.01663954 0.01404291 0.01482780 0.01438768 0.01516077 0.01412730 0.01507713 0.01535513 0.01396821 0.01369466
3 0.04201375 0.05580790 0.05242992 0.04580444 0.04535939 0.04693676 0.04538370 0.05194032 0.04619325 0.05184223 0.04756345 0.04454057 0.04958194 0.04935086 0.04020438 0.04895863 0.04943856 0.04918562 0.05708861 0.04350526 0.04954803 0.05145199
4 0.05186781 0.04663064 0.04314027 0.04856635 0.04927909 0.04368918 0.04859993 0.04780042 0.05127405 0.04492968 0.04094500 0.04437471 0.04943441 0.04348060 0.05634499 0.04440898 0.05003490 0.04732073 0.04429168 0.04079751 0.04634148 0.04318101
5 0.03320854 0.03467903 0.02873141 0.02761221 0.03334755 0.03308418 0.03358289 0.02813574 0.03450652 0.03354310 0.03422670 0.03157600 0.03504632 0.03495900 0.02582677 0.03576083 0.03498791 0.03414037 0.02823269 0.03110486 0.03131945 0.03435295
6 0.09762724 0.08825971 0.06093197 0.10880480 0.10815741 0.09406919 0.09306372 0.08663927 0.09110655 0.11884012 0.11612300 0.08971568 0.09937172 0.12671764 0.09852102 0.09821622 0.09707290 0.08437653 0.09894354 0.11832623 0.12101217 0.14200803
      trnn89     trnn90     trnn91     trnn92     trnn93     trnn94     trnn95     trnn96     trnn97     trnn98     trnn99    trnn100
1 0.03418912 0.03104064 0.03304026 0.03473297 0.03138782 0.03141967 0.03115403 0.03473169 0.03140489 0.03374033 0.02995231 0.03252547
2 0.01566914 0.01451029 0.01304637 0.01471823 0.01371036 0.01401920 0.01453998 0.01504461 0.01450837 0.01385750 0.01348681 0.01452901
3 0.04399392 0.04587868 0.05064937 0.05090705 0.04756370 0.04862836 0.05147610 0.04689068 0.04621500 0.04783325 0.05721093 0.04710874
4 0.04434810 0.04428637 0.04892955 0.04240259 0.04721870 0.04384085 0.04434793 0.04121917 0.04301554 0.04192032 0.04499316 0.04461941
5 0.03571272 0.03303706 0.03249423 0.03309621 0.03257182 0.03286076 0.03175409 0.03759508 0.03279286 0.03472803 0.03115466 0.03203346
6 0.10876140 0.11284538 0.12753597 0.09322596 0.09574514 0.10905187 0.13349757 0.10318592 0.11104747 0.10358711 0.07836963 0.12170499

 

Bootstrap sample prediction on the validation set

  id      vlnn1      vlnn2      vlnn3      vlnn4      vlnn5      vlnn6      vlnn7      vlnn8      vlnn9     vlnn10     vlnn11     vlnn12     vlnn13     vlnn14     vlnn15     vlnn16     vlnn17     vlnn18     vlnn19     vlnn20     vlnn21     vlnn22
1  2 0.02628076 0.02997000 0.02582887 0.02559066 0.02402371 0.02613369 0.02594159 0.02357499 0.02512625 0.02587473 0.02367226 0.02531988 0.02776712 0.02734300 0.02656765 0.02583900 0.02675010 0.02699526 0.02645755 0.02807889 0.02492284 0.02585108
2  3 0.09271819 0.09213025 0.07877154 0.08811843 0.08614261 0.08170480 0.08221156 0.10167134 0.09365375 0.08969898 0.08323263 0.09669850 0.08772222 0.09220628 0.08897939 0.08052270 0.08847554 0.08628625 0.08568370 0.09199577 0.07992205 0.08416455
3  5 0.02810009 0.03206981 0.02780016 0.02816363 0.02639398 0.02840389 0.02858208 0.02565085 0.02740287 0.02759510 0.02594926 0.02764211 0.03103979 0.02997277 0.02881709 0.02849801 0.02899102 0.02907924 0.02878280 0.03065015 0.02742588 0.02767956
4  7 0.02341829 0.02153186 0.02428057 0.02558994 0.02695570 0.02288243 0.02384770 0.02206810 0.02521716 0.02412920 0.02828286 0.02456912 0.02591555 0.02595626 0.02322385 0.02794993 0.02450656 0.02296205 0.02412875 0.02676568 0.02634699 0.02209525
5  9 0.01675887 0.01737026 0.01623289 0.01529106 0.01428944 0.01565895 0.01444745 0.01438202 0.01578564 0.01648139 0.01406815 0.01590932 0.01547843 0.01629393 0.01591590 0.01566465 0.01633205 0.01663453 0.01508660 0.01698639 0.01443616 0.01596222
6 12 0.15542802 0.13358745 0.11398652 0.13687321 0.12406926 0.13859771 0.12802348 0.18265633 0.15907026 0.14008490 0.13232111 0.12806024 0.14856814 0.14134436 0.14005341 0.12872545 0.13398002 0.13917769 0.12008303 0.14606123 0.11033006 0.12665956
      vlnn23     vlnn24     vlnn25     vlnn26     vlnn27     vlnn28     vlnn29     vlnn30     vlnn31     vlnn32     vlnn33     vlnn34     vlnn35     vlnn36     vlnn37     vlnn38     vlnn39     vlnn40     vlnn41     vlnn42     vlnn43     vlnn44
1 0.02639870 0.02769522 0.02851187 0.02683817 0.02836340 0.02782821 0.02771161 0.02518485 0.02601988 0.02597011 0.02673611 0.02610003 0.02581297 0.02419690 0.02532368 0.02802989 0.02531261 0.02349277 0.02617165 0.02629824 0.02528710 0.02390897
2 0.08635599 0.08975768 0.09578737 0.09278980 0.08220172 0.08423409 0.08955112 0.08411549 0.08728100 0.08743687 0.09740851 0.08673818 0.09248042 0.08688928 0.09372889 0.09183032 0.08714741 0.09773682 0.09025844 0.09414825 0.08807900 0.07626028
3 0.02808478 0.02967297 0.03090076 0.02858478 0.03043180 0.03010112 0.03108129 0.02727403 0.02787171 0.02834659 0.02842323 0.02799350 0.02773727 0.02662895 0.02765467 0.03046883 0.02697905 0.02565448 0.02832178 0.02835030 0.02744923 0.02566691
4 0.02186327 0.02811020 0.02413905 0.02385045 0.02387828 0.02378345 0.02551742 0.02280550 0.02394849 0.02522943 0.02342254 0.02116940 0.02455185 0.02216572 0.02810542 0.02679126 0.02302497 0.02821277 0.02229749 0.02124192 0.02213795 0.02526674
5 0.01744027 0.01690578 0.01656617 0.01729061 0.01814040 0.01614275 0.01495732 0.01499695 0.01660710 0.01614452 0.01684638 0.01620698 0.01550989 0.01416948 0.01484241 0.01774498 0.01675532 0.01422825 0.01598007 0.01499112 0.01492059 0.01686187
6 0.13718251 0.13313012 0.15944703 0.15101746 0.12211723 0.12521067 0.15247715 0.12260184 0.12892199 0.13678863 0.17671272 0.12874962 0.13620969 0.12812933 0.16308142 0.10778555 0.13934294 0.19969261 0.15177595 0.13876633 0.13693625 0.14050116
      vlnn45     vlnn46     vlnn47     vlnn48     vlnn49     vlnn50     vlnn51     vlnn52     vlnn53     vlnn54     vlnn55     vlnn56     vlnn57     vlnn58     vlnn59     vlnn60     vlnn61     vlnn62     vlnn63     vlnn64     vlnn65     vlnn66
1 0.02693213 0.02592995 0.02613634 0.02380692 0.02743892 0.02448898 0.02791040 0.02535974 0.02365407 0.02444242 0.02689811 0.02123435 0.02864576 0.02729132 0.02575267 0.02620255 0.02228777 0.02637850 0.02606122 0.02580954 0.02451165 0.03046702
2 0.08783714 0.09041584 0.09216814 0.08378164 0.09665465 0.08926922 0.09885182 0.08982583 0.09490492 0.09630134 0.08586623 0.10733172 0.09510642 0.09611038 0.08414569 0.08887921 0.09059048 0.08532234 0.08661264 0.08904983 0.08926319 0.07398367
3 0.02931510 0.02782501 0.02835056 0.02585849 0.02940116 0.02641101 0.02958658 0.02700130 0.02553094 0.02692601 0.02969812 0.02237307 0.03147216 0.02928312 0.02821946 0.02867618 0.02409299 0.02869439 0.02848525 0.02835097 0.02623859 0.03291906
4 0.02457851 0.02285176 0.02691459 0.02147800 0.02304282 0.02700882 0.02753674 0.02343555 0.02505925 0.02400009 0.02848758 0.02787134 0.02331517 0.02232890 0.02345589 0.02558108 0.02548360 0.02423944 0.02701634 0.02680367 0.02611899 0.03038204
5 0.01572599 0.01637368 0.01506284 0.01425367 0.01698082 0.01628936 0.01824648 0.01667688 0.01574545 0.01452923 0.01448116 0.01617627 0.01659733 0.01694637 0.01443203 0.01555639 0.01472439 0.01643758 0.01673206 0.01567521 0.01736716 0.02070581
6 0.13064660 0.13645880 0.18466935 0.12345506 0.15384245 0.13002440 0.15837082 0.14127097 0.11823478 0.16117372 0.12696795 0.14560224 0.14921099 0.14896286 0.12059137 0.12512968 0.10795717 0.13187468 0.10515683 0.13606813 0.14966836 0.15514037
      vlnn67     vlnn68     vlnn69     vlnn70     vlnn71     vlnn72     vlnn73     vlnn74     vlnn75     vlnn76     vlnn77     vlnn78     vlnn79     vlnn80     vlnn81     vlnn82     vlnn83     vlnn84     vlnn85     vlnn86     vlnn87     vlnn88
1 0.02494592 0.02888945 0.02721838 0.02227167 0.02751301 0.02654953 0.02694154 0.02292860 0.02844769 0.02530868 0.02597013 0.02548784 0.02894876 0.02776558 0.03396584 0.02908953 0.02639905 0.02692646 0.02906019 0.02367104 0.02437775 0.02657998
2 0.08299567 0.08631103 0.09234803 0.08073799 0.09003267 0.08400277 0.09102151 0.09448900 0.09214434 0.09364719 0.08962359 0.08701365 0.09644782 0.08627712 0.08781973 0.09259808 0.09306518 0.09590619 0.08499848 0.09589372 0.08753062 0.08639423
3 0.02784666 0.03107571 0.03033957 0.02406735 0.02929191 0.02882741 0.02886446 0.02476278 0.03094292 0.02743746 0.02827360 0.02714542 0.03087368 0.03016397 0.03946886 0.03129738 0.02847108 0.02891231 0.03201939 0.02530147 0.02645417 0.02890841
4 0.02648568 0.02671621 0.02648819 0.02912035 0.02368000 0.02305938 0.02275323 0.02761061 0.02411370 0.02304079 0.02158639 0.02427864 0.02744590 0.02358705 0.02931434 0.02041510 0.02827307 0.02494181 0.02633809 0.02442302 0.02448451 0.02351774
5 0.01416755 0.01670702 0.01522428 0.01562940 0.01691880 0.01549225 0.01709109 0.01565825 0.01619409 0.01561888 0.01532968 0.01654936 0.01845533 0.01586025 0.01651328 0.01651699 0.01662601 0.01595342 0.01665597 0.01637089 0.01518796 0.01537131
6 0.14200240 0.14747769 0.15087037 0.16867520 0.13922049 0.12773104 0.14432917 0.12126474 0.14737693 0.13323616 0.13709849 0.13120098 0.13948600 0.12974331 0.11767273 0.13233277 0.16437832 0.13919548 0.14608090 0.14271817 0.13041081 0.12210121
      vlnn89     vlnn90     vlnn91     vlnn92     vlnn93     vlnn94     vlnn95     vlnn96     vlnn97     vlnn98     vlnn99    vlnn100
1 0.02810574 0.02567430 0.02784941 0.03025843 0.02554068 0.02570497 0.02541590 0.02813368 0.02612519 0.02838363 0.02411587 0.02782791
2 0.08608713 0.09425549 0.08059666 0.09525928 0.08197991 0.08457087 0.09825485 0.09471985 0.08527844 0.08164522 0.08928468 0.09207923
3 0.03032303 0.02744373 0.03078320 0.03354816 0.02774601 0.02773174 0.02779568 0.03016590 0.02803489 0.03053738 0.02640059 0.03061706
4 0.02210845 0.02313926 0.02619442 0.02363222 0.02409012 0.02315283 0.02419291 0.02056053 0.02253267 0.02417382 0.02390241 0.02536695
5 0.01734782 0.01611782 0.01477490 0.01646831 0.01523572 0.01559828 0.01575804 0.01693214 0.01616770 0.01598217 0.01470546 0.01605116
6 0.14470794 0.13708843 0.10672944 0.15400308 0.12625363 0.12371341 0.14369842 0.13791204 0.12862037 0.11153468 0.13186793 0.13318916

 

Get 5th percentile lower-bound probability
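The lower bound is taken row-wise across the 100 bootstrap predictions. A minimal sketch, assuming the bootstrap predictions sit in a data frame (here hypothetically named `nn_boot`) with the `trnn1`–`trnn100` columns printed above:

```r
# 5th-percentile lower bound across the 100 bootstrap predictions per customer:
# a conservative estimate of each customer's response probability
library(dplyr)
nn_boot <- nn_boot %>%
  rowwise() %>%
  mutate(prob_nn_lb1 = quantile(c_across(starts_with("trnn")), probs = 0.05)) %>%
  ungroup()
```

The same pattern with `starts_with("vlnn")` produces the validation lower bound.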

 

Training prediction

# A tibble: 6 x 2
     id pred_nb
  <int>   <dbl>
1     1 0.0176 
2     4 0.00857
3     6 0.0261 
4     8 0.0289 
5    10 0.0188 
6    11 0.993  

 
Validation prediction

# A tibble: 6 x 2
     id pred_nb
  <int>   <dbl>
1     2 0.0143 
2     3 0.0793 
3     5 0.0155 
4     7 0.0200 
5     9 0.00919
6    12 0.196  

   

Evaluate the Performance of the Neural Network Model

Training prediction
[1] “Based on our analysis, the number of customers Intuit should mail is 26,920 that is 51.28% of the customers.
The response rate for the selected customers is predicted to be 7.73%, or, 2,082 buyers; while the actual response rate is 4.76%, or, 2,498.
The predicted margin is $124,920.00; while actual margin is $149,880.00.
The expected profit is $86,963. The messaging cost is estimated to be $37,957 with a ROME of 2.29.”  

Validation prediction [1] “Based on our analysis, the number of customers Intuit should mail is 11,711 that is 52.05% of the customers.
The response rate for the selected customers is predicted to be 7.76%, or, 909 buyers; while the actual response rate is 4.90%, or, 1,103.
The predicted margin is $54,540.00; while actual margin is $66,180.00.
The expected profit is $38,027. The messaging cost is estimated to be $16,513 with a ROME of 2.30.”      


PART III Performance Evaluation for All Models

 

Now that we’ve built four models, it’s time to review their performance side by side. We’ll first compare profit and ROME for both the training and validation data, then visualize the lift and gains under the different models to evaluate efficiency; lastly, we’ll compare the models’ AUC scores and construct the confusion matrix.
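One way to produce these comparisons inside the rsm-msba-spark container is `radiant.model::evalbin()`. A sketch, assuming the four prediction columns named in the confusion-matrix output have already been added to `new_intuit`:

```r
# Compare the four models on profit, ROME, lift, and gains with radiant.model
library(radiant.model)
eval_dat <- evalbin(
  new_intuit,
  pred   = c("rfm_resp", "prob_log_lbB", "pred_nb", "prob_nn_lb1"),
  rv     = "res1", lev = "Yes",
  cost   = 1.41, margin = 60,
  train  = "Both",
  data_filter = "training == 1"
)
plot(eval_dat, plots = c("profit", "rome", "lift", "gains"))
```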
 

 

Profit - training set

 

The predicted training profit is highest under the NN model, leading the second-highest profit, under the logistic model, by 1.70%. However, this result may simply reflect the NN model’s strong ability to fit the training data, so it could be overfitting relative to the logistic model. The validation results will speak more to this matter.

 

 

Profit - validation set

 

Looking at the validation predictions, the NN model still performs well, but this time its profit is surpassed by the logistic model’s by 0.57%. Based on this result, we think the logistic model generalizes better and is therefore the best model to use for targeting customers.

 

 
 

ROME - Training set

 

Under both the training and validation sets, the Naive Bayes model outperforms all the other models on return on marketing expenditure (ROME); however, given its weaker profit predictions, we do not prefer it over the logistic model for targeting customers.

 

 

ROME - Validation set

 

   

Lift and Gains

From the lift chart, we see that the machine-learning models are much more efficient at predicting and targeting purchases than the non-machine-learning sequential RFM model. The three machine-learning models differ only slightly in the lift and gains charts, but the logistic and NN models still capture more responses than the other models when targeting the same percentage of customers. This suggests that using these two models can help Intuit generate more profit with a smaller budget.

 

   

Confusion matrix

 

Investigating the confusion matrix, we can see how the logistic model surpassed the NN model on the validation data. Because the NN model may have learned the training-data patterns too well, it applies a stricter standard when judging a customer’s purchase probability; it therefore assigns many customers lower probabilities than the logistic model, which generalizes the data and judges customers more liberally. As a result, the NN model has fewer false positives and a higher TNR, while the logistic model has more false positives but a lower TNR. In our case, the mailing cost is extremely low relative to the margin, so it costs Intuit virtually nothing to send a bit more mail but a lot to miss a potential buyer.
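This asymmetry can be quantified directly from the Test rows of the confusion-matrix output above (cost $1.41, margin $60); a back-of-the-envelope check:

```r
# LR vs NN on the validation set: trade extra mailings for fewer missed buyers
extra_fp_cost   <- (12185 - 10802) * 1.41     # LR mails 1,383 more non-buyers
extra_fn_margin <- (194 - 157) * (60 - 1.41)  # NN misses 37 more actual buyers
extra_fn_margin - extra_fp_cost               # ~ $218
```

The roughly $218 difference matches the validation profit gap between prob_log_lbB ($38,245) and prob_nn_lb1 ($38,027) in the table.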

 

Based on the confusion matrix, we are further convinced that the logistic regression model is the best one to use for the final prediction.

 

Confusion matrix
Data       : new_intuit 
Filter     : training == 1 
Results for: Both 
Predictors : rfm_resp, prob_log_lbB, pred_nb, prob_nn_lb1 
Response   : res1 
Level      : Yes in res1 
Cost:Margin: 1.41 : 60 

     Type    Predictor    TP     FP     TN  FN  total   TPR   TNR precision Fscore
 Training     rfm_resp 2,312 40,530  9,472 186 52,500 0.926 0.189     0.054  0.102
 Training prob_log_lbB 2,137 28,156 21,846 361 52,500 0.855 0.437     0.071  0.130
 Training      pred_nb 1,905 19,597 30,405 593 52,500 0.763 0.608     0.089  0.159
 Training  prob_nn_lb1 2,082 24,838 25,164 416 52,500 0.833 0.503     0.077  0.142
     Test     rfm_resp 1,022 17,166  4,231  81 22,500 0.927 0.198     0.056  0.106
     Test prob_log_lbB   946 12,185  9,212 157 22,500 0.858 0.431     0.072  0.133
     Test      pred_nb   844  8,432 12,965 259 22,500 0.765 0.606     0.091  0.163
     Test  prob_nn_lb1   909 10,802 10,595 194 22,500 0.824 0.495     0.078  0.142

     Type    Predictor accuracy kappa profit index  ROME contact   AUC
 Training     rfm_resp    0.224 0.013 78,313 0.901 1.296   0.816 0.664
 Training prob_log_lbB    0.457 0.047 85,507 0.983 2.002   0.577 0.765
 Training      pred_nb    0.615 0.080 83,982 0.966 2.770   0.410 0.746
 Training  prob_nn_lb1    0.519 0.060 86,963 1.000 2.291   0.513 0.772
     Test     rfm_resp    0.233 0.015 35,675 0.933 1.391   0.808 0.680
     Test prob_log_lbB    0.451 0.047 38,245 1.000 2.066   0.584 0.764
     Test      pred_nb    0.614 0.082 37,561 0.982 2.872   0.412 0.743
     Test  prob_nn_lb1    0.511 0.057 38,027 0.994 2.303   0.520 0.765

       


PART IV - Re-evaluate Model Performance with Lower Projected Response Rate

   

As we know, customers will be less likely to respond in Wave 2 than in Wave 1. To make sure we have picked the best model, we will re-evaluate the four models using a higher cut-off to label whether a customer will respond. The new cut-off doubles the break-even response rate used in Parts I–III to reflect the 50% reduction in the response rate; the new cut-off is 0.05.
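The cut-off arithmetic, using the $1.41 cost and $60 margin listed in the confusion-matrix output (the text rounds the doubled break-even up to 0.05):

```r
cost   <- 1.41
margin <- 60
breakeven       <- cost / margin   # ~0.0235 break-even response rate
breakeven_wave2 <- 2 * breakeven   # ~0.047, used here as 0.05
```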

   

Sequential RFM model

   

Revised profitable RFM IDs

 [1] "142" "354" "354" "154" "224" "232" "453" "452" "552" "311"

   

Evaluate RFM performance

 

Training [1] “Based on our analysis, the number of customers Intuit should mail is 52,500 that is 100.00% of the customers.
The response rate for the selected customers is predicted to be 4.76%, or, 2,498 buyers; while the actual response rate is 4.76%, or, 2,498.
The predicted margin is $149,880.00; while actual margin is $149,880.00.
The expected profit is $75,855. The messaging cost is estimated to be $74,025 with a ROME of 1.02.”  

Validation [1] “Based on our analysis, the number of customers Intuit should mail is 22,500 that is 100.00% of the customers.
The response rate for the selected customers is predicted to be 4.90%, or, 1,103 buyers; while the actual response rate is 4.90%, or, 1,103.
The predicted margin is $66,180.00; while actual margin is $66,180.00.
The expected profit is $34,455. The messaging cost is estimated to be $31,725 with a ROME of 1.09.”      

Logistic Regression

To be prudent, we redo the process from the previous part to check whether “VI” is indeed a better predictor than “zip_one”.

Performance using ‘zip_one’

 

Training

[1] “Based on our analysis, the number of customers Intuit should mail is 14,812 that is 28.21% of the customers.
The response rate for the selected customers is predicted to be 10.82%, or, 1,602 buyers; while the actual response rate is 4.76%, or, 2,498.
The predicted margin is $96,120.00; while actual margin is $149,880.00.
The expected profit is $75,235. The messaging cost is estimated to be $20,885 with a ROME of 3.60.”  

Validation

[1] “Based on our analysis, the number of customers Intuit should mail is 6,413 that is 28.50% of the customers.
The response rate for the selected customers is predicted to be 11.40%, or, 731 buyers; while the actual response rate is 4.90%, or, 1,103.
The predicted margin is $43,860.00; while actual margin is $66,180.00.
The expected profit is $34,818. The messaging cost is estimated to be $9,042 with a ROME of 3.85.”    

Performance using ‘VI’

 

Training

[1] “Based on our analysis, the number of customers Intuit should mail is 13,505 that is 25.72% of the customers.
The response rate for the selected customers is predicted to be 11.66%, or, 1,575 buyers; while the actual response rate is 4.76%, or, 2,498.
The predicted margin is $94,500.00; while actual margin is $149,880.00.
The expected profit is $75,458. The messaging cost is estimated to be $19,042 with a ROME of 3.96.”  

Validation

[1] “Based on our analysis, the number of customers Intuit should mail is 5,867 that is 26.08% of the customers.
The response rate for the selected customers is predicted to be 12.25%, or, 719 buyers; while the actual response rate is 4.90%, or, 1,103.
The predicted margin is $43,140.00; while actual margin is $66,180.00.
The expected profit is $34,868. The messaging cost is estimated to be $8,272 with a ROME of 4.21.”  

Compare Profit and ROME

 

From the profit comparison chart we can see that “VI” remains the better predictor in the logistic model, providing higher profit than “zip_one” even when the response rate decreases. This time, the ROME is also much higher when using “VI” as the predictor to target customers.

Profit

   

ROME

   

Naive Bayes

Performance

 

Training

[1] “Based on our analysis, the number of customers Intuit should mail is 11,092 that is 21.13% of the customers.
The response rate for the selected customers is predicted to be 12.52%, or, 1,389 buyers; while the actual response rate is 4.76%, or, 2,498.
The predicted margin is $83,340.00; while actual margin is $149,880.00.
The expected profit is $67,700. The messaging cost is estimated to be $15,640 with a ROME of 4.33.”  

Validation

[1] “Based on our analysis, the number of customers Intuit should mail is 4,836 that is 21.49% of the customers.
The response rate for the selected customers is predicted to be 12.76%, or, 617 buyers; while the actual response rate is 4.90%, or, 1,103.
The predicted margin is $37,020.00; while actual margin is $66,180.00.
The expected profit is $30,201. The messaging cost is estimated to be $6,819 with a ROME of 4.43.”      

Neural Network Model

 

Evaluate the performance of the Neural Network model

 

Training

[1] “Based on our analysis, the number of customers Intuit should mail is 11,873 that is 22.62% of the customers.
The response rate for the selected customers is predicted to be 12.70%, or, 1,508 buyers; while the actual response rate is 4.76%, or, 2,498.
The predicted margin is $90,480.00; while actual margin is $149,880.00.
The expected profit is $73,739. The messaging cost is estimated to be $16,741 with a ROME of 4.40.”  

Validation

[1] “Based on our analysis, the number of customers Intuit should mail is 5,201 that is 23.12% of the customers.
The response rate for the selected customers is predicted to be 12.98%, or, 675 buyers; while the actual response rate is 4.90%, or, 1,103.
The predicted margin is $40,500.00; while actual margin is $66,180.00.
The expected profit is $33,167. The messaging cost is estimated to be $7,333 with a ROME of 4.52.”        

Performance Re-evaluation for All Models

Let’s review the performance of all the models again at the new cut-off. We’ll first compare profit and ROME for both the training and validation data, then visualize the lift and gains under the different models to evaluate efficiency; lastly, we’ll compare the models’ AUC scores and construct the confusion matrix.

 

Profit - training set

 

The highest predicted training profit is now under the sequential RFM model, leading the second-highest profit, under the logistic model, by 0.53%. But what we really care about is performance on the validation set, where the highest predicted profit still comes from the logistic model.

 

Profit - validation set

Lift and Gains

 

From the lift chart, we can now see a distinct advantage of the logistic and NN models over the other models, especially sequential RFM. Likewise, in the gains chart, the logistic and NN models capture more purchases than the other models when targeting the same proportion of customers. This suggests that using these two models can help Intuit generate more profit with a smaller budget.

   

Confusion matrix

 

Investigating the confusion matrix at the new cut-off, we still see a similar pattern in how the logistic model surpasses the NN model on the validation data.

Based on the confusion matrix, we are further convinced that the logistic regression model is the best one to use in the final prediction.
 

Confusion matrix
Data       : new_intuit2 
Filter     : training == 1 
Results for: Both 
Predictors : rfm_resp, prob_log_lbB, pred_nb, prob_nn_lb1 
Response   : res1 
Level      : Yes in res1 
Cost:Margin: 1.41 : 60 

     Type    Predictor    TP     FP     TN  FN  total   TPR   TNR precision Fscore
 Training     rfm_resp 2,312 40,530  9,472 186 52,500 0.926 0.189     0.054  0.102
 Training prob_log_lbB 2,137 28,156 21,846 361 52,500 0.855 0.437     0.071  0.130
 Training      pred_nb 1,905 19,597 30,405 593 52,500 0.763 0.608     0.089  0.159
 Training  prob_nn_lb1 2,082 24,838 25,164 416 52,500 0.833 0.503     0.077  0.142
     Test     rfm_resp 1,022 17,166  4,231  81 22,500 0.927 0.198     0.056  0.106
     Test prob_log_lbB   946 12,185  9,212 157 22,500 0.858 0.431     0.072  0.133
     Test      pred_nb   844  8,432 12,965 259 22,500 0.765 0.606     0.091  0.163
     Test  prob_nn_lb1   909 10,802 10,595 194 22,500 0.824 0.495     0.078  0.142

     Type    Predictor accuracy kappa profit index  ROME contact   AUC
 Training     rfm_resp    0.224 0.013 78,313 0.901 1.296   0.816 0.664
 Training prob_log_lbB    0.457 0.047 85,507 0.983 2.002   0.577 0.765
 Training      pred_nb    0.615 0.080 83,982 0.966 2.770   0.410 0.746
 Training  prob_nn_lb1    0.519 0.060 86,963 1.000 2.291   0.513 0.772
     Test     rfm_resp    0.233 0.015 35,675 0.933 1.391   0.808 0.680
     Test prob_log_lbB    0.451 0.047 38,245 1.000 2.066   0.584 0.764
     Test      pred_nb    0.614 0.082 37,561 0.982 2.872   0.412 0.743
     Test  prob_nn_lb1    0.511 0.057 38,027 0.994 2.303   0.520 0.765

     


PART V - Projecting Total Profit and Customer Selection

 

Based on the exploratory data analysis and the modeling process, we decided to use the prediction results of Logistic Regression B from Part IV as the guideline for targeting customers.

Intuit has 801,821 customers in total, of whom 38,487 responded in Wave 1, leaving 763,334 un-responded customers for Wave 2. In the validation set, we have 21,397 un-responded customers and we will mail 4,764 of them, or 22.26% of all un-responded customers in the validation set. Scaling to the full set of un-responded customers, we will therefore mail to 169,955 of them. The predicted validation profit is $34,867.53, i.e., $5.94 per Wave-2 mail-to customer as estimated by our best model. Multiplying by the projected 169,955 mail-to customers gives a scaled total profit of $1,010,039.99.
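The projection amounts to the following arithmetic (all figures from the text; small differences are due to rounding):

```r
total_unresp <- 801821 - 38487             # 763,334 wave-2 (un-responded) customers
mail_share   <- 4764 / 21397               # 22.26% of validation non-responders mailed
mail_to      <- total_unresp * mail_share  # ~169,955 customers
profit_per   <- 34867.53 / 5867            # ~$5.94 per mailed customer, as in the text
mail_to * profit_per                       # ~$1,010,040 scaled total profit
```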

     


PART VI - Business Types That Are More Likely To Upgrade

 

First, looking at the results from our best model, all of the features are statistically significant: “VI”, “numords”, “version1”, “owntaxprod”, and “upgraded” each significantly affect customers’ response probability.

Second, the Olden plot indicates that the variables described above are important to the response probability. Specifically, “VI” shows the greatest importance in the plot, and importance decreases in the order of “upgraded”, “numords”, “version1”, and “owntaxprod”.

Overall, we conclude that the businesses more likely to upgrade are located in the U.S. Virgin Islands (the “VI” feature), currently use version 1 or have upgraded from version 1 to version 2, own Intuit’s tax software, and may have placed a large number of orders previously.